Skip to content

Conversation

@richardstartin
Copy link
Collaborator

…t (seed dependent behaviour)

@richardstartin richardstartin mentioned this pull request Dec 23, 2019
@richardstartin
Copy link
Collaborator Author

In the blocked bloom filters was it intended that the same key would never be mapped to the same word (they now do whenever the upper 4 bits of the hash are zero), hence the plus 1? Should the solution be to increase (double?) the size of the bitset instead?

Copy link
Contributor

@thomasmueller thomasmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments... Unfortunately I don't have much time currently to work on this, but I will give you access rights so you can commit yourself once the issues are addressed.

public class BlockedBloom implements Filter {

public static BlockedBloom construct(long[] keys, int bitsPerKey) {
long n = keys.length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using long doesn't make sense here... I used long in some places to avoid integer overflow, but here it's not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where you can reason that overflow is impossible, accumulating into a long can cause significant degradation in throughput once the loop has been compiled by C2. I can share some benchmarks which demonstrate this. Of course, sometimes it's the only safe thing to do...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wasn't aware of this... I will keep this in mind!

}
int si = 0;
while (si < 2 * keys.length) {
while (si < 2 * keys.length && qi > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XorSimple is (on purpose) the most simple possible implementation. It is only for educational purposes. It shouldn't have any performance improvements. But of course it shouldn't have any bugs either... So not sure if this is really needed?

Copy link
Collaborator Author

@richardstartin richardstartin Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a bug where qi goes negative in the loop, rather than fail to map. This occurred for the first time with the seed in the regression tests, but cannot be reproduced because of the unseeded calls to new Random().nextLong() (which #13 aims to fix). Basically there do exist cases where this loop will access Q[-1] rather than fail to map and try again, but it's nondeterministic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you omit this change from your PR and, instead, report it as a bug. It seems likely to me that your fix (qi > 0) is just patching over a logical error of some kind. It would be better to identify the logical error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I will do that.

@richardstartin
Copy link
Collaborator Author

@thomasmueller I sensed this wasn't the correct fix for the blocked bloom filters so will address that.

@richardstartin
Copy link
Collaborator Author

@lemire do you have any time to look at this?

@lemire
Copy link
Member

lemire commented Dec 26, 2019

@richardstartin See my proposal. I think that only one change you are making is somewhat controversial in that it might be hiding a real bug instead of exposing it. I recommend you omit this change and then I think that the whole PR could be safely merged without any controversy.

@thomasmueller
Copy link
Contributor

I would love to give you more feedback, but I'm afraid I have very little time now... Next year should be better.

@richardstartin
Copy link
Collaborator Author

richardstartin commented Dec 27, 2019

I will leave the branch like this - there are several false negative bugs which aren't trivial to fix, and I've reverted the change to XorSimple. SimpleFuzzer will stop crashing on XorSimple when/if it gets fixed.

@lemire
Copy link
Member

lemire commented Dec 27, 2019

@thomasmueller

I recommend merging this PR. Do you agree?

@thomasmueller
Copy link
Contributor

I think it would break the build, see https://travis-ci.com/FastFilter/fastfilter_java/builds/142464328

What about disabling the tests for now, so the build passes, and create issues for the problems? That way I think merging is fine.

@richardstartin
Copy link
Collaborator Author

You shouldn't merge the PR, I've left it to break to highlight the existence of various bugs which need fixing or otherwise removing.

@lemire
Copy link
Member

lemire commented Dec 27, 2019

Ok.

My view was that it does not really break anything, it merely exposes existing problems and thus is safe to merge. But I see that Richard has opened distinct bug issues.

@richardstartin
Copy link
Collaborator Author

@lemire I don't think there is a rush to merge it. I was keen to help get the project to the point where it could be released and used, but there are several nondeterministic bugs I don't have enough context to be able to fix, so my goal may have been a little premature. The fixes which are made in this PR are trivial, and merging the branch will just break master. It's worth leaving around for now to verify that the harder fixes have been made.

@lemire
Copy link
Member

lemire commented Dec 27, 2019

Ok.

@thomasmueller thomasmueller merged commit 9b6c6d8 into FastFilter:master Dec 27, 2019
@thomasmueller
Copy link
Contributor

I merged the patch, as it improves the code a lot. I will try to resolve the remaining issues.

One option is to move the buggy implementations (specially MPHF, GCS2) to the "test" source folder, until they are fixed (kind of like "staging").

@richardstartin
Copy link
Collaborator Author

@thomasmueller I think moving them to test makes sense, whilst there are completely new filter implementations, the availability of some filters for evaluation purposes only could be confusing for users.

@lemire
Copy link
Member

lemire commented Dec 27, 2019

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants